Confidence Intervals

Dr. Lucy D’Agostino McGowan

Confidence Intervals for Individual Coefficients

The Basic Formula

For coefficient \(\beta_j\):
\[\hat{\beta}_j \pm t_{\alpha/2, n-p} \times \text{se}(\hat{\beta}_j)\]

Where:
- \(t_{\alpha/2, n-p}\) is the critical value from the \(t\)-distribution
- \(\text{se}(\hat{\beta}_j)\) is the standard error
- \(n - p\) is the degrees of freedom

Where Does This Come From?

\(\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})\)

For individual coefficient:
\[\hat{\beta}_j \sim N(\beta_j, \sigma^2[(\mathbf{X}^T\mathbf{X})^{-1}]_{jj})\]

Where Does This Come From?

\(\hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})\)

Standardize:
\[\frac{\hat{\beta}_j - \beta_j}{\sigma\sqrt{[(\mathbf{X}^T\mathbf{X})^{-1}]_{jj}}} \sim N(0, 1)\]

The t-Distribution Connection

Problem: We don’t know \(\sigma\), so we estimate it with \(\hat{\sigma}\)

Result: Replace \(\sigma\) with \(\hat{\sigma}\) and the standardized statistic follows a \(t\)-distribution with \(n - p\) degrees of freedom
\[\frac{\hat{\beta}_j - \beta_j}{\hat{\sigma}\sqrt{[(\mathbf{X}^T\mathbf{X})^{-1}]_{jj}}} \sim t_{n-p}\]
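A quick way to see the cost of estimating \(\sigma\): for small degrees of freedom the \(t\) critical value is noticeably larger than the normal value of 1.96, so intervals are wider.

```r
# t critical values shrink toward the normal value as df grows
df_vals <- c(5, 10, 30, 100)
round(qt(0.975, df_vals), 3)
#> [1] 2.571 2.228 2.042 1.984
qnorm(0.975)  # approximately 1.96
```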

Standard Error Calculation

\[\text{se}(\hat{\beta}_j) = \hat{\sigma}\sqrt{[(\mathbf{X}^T\mathbf{X})^{-1}]_{jj}}\]

Where:
- \(\hat{\sigma}^2 = \frac{\text{SSE}}{n-p}\) (MSE)
- \([(\mathbf{X}^T\mathbf{X})^{-1}]_{jj}\) is the \(j\)-th diagonal element
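As a small sketch (the matrix and MSE below are made-up values, not from any dataset in these slides), the standard errors come straight from the diagonal of \(\hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}\):

```r
# Hypothetical (X^T X)^{-1} and MSE, for illustration only
XtX_inv <- matrix(c( 0.5, -0.2,
                    -0.2,  0.1), nrow = 2, byrow = TRUE)
sigma2_hat <- 9

# se(beta_j) = sigma_hat * sqrt(j-th diagonal element)
se_beta <- sqrt(sigma2_hat * diag(XtX_inv))
round(se_beta, 3)
#> [1] 2.121 0.949
```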

You Try: Calculate Standard Error

Given:
\[(\mathbf{X}^T\mathbf{X})^{-1} = \begin{bmatrix} 0.25 & -0.10 \\ -0.10 & 0.05 \end{bmatrix}\] \[\hat{\sigma}^2 = 16\]

Find: \(\text{se}(\hat{\beta}_1)\)

02:00

Constructing the 95% CI

Formula:
\[\text{CI}_{95\%} = \hat{\beta}_j \pm t_{0.025, n-p} \times \text{se}(\hat{\beta}_j)\]

Example: CI Calculation

Given: \(\hat{\beta}_1 = 3.2\), \(\text{se}(\hat{\beta}_1) = 0.8\), \(n = 30\), \(p = 3\)

Step 1: Find critical value
\(t_{0.025, 27} = 2.052\) (qt(0.975, 27))

Step 2: Construct interval
\[\text{CI}_{95\%} = [3.2 - 2.052\times0.8, 3.2 + 2.052\times0.8] = [1.6, 4.8]\]
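Checking the arithmetic in R (carrying full precision, then rounding to one decimal place):

```r
beta_hat <- 3.2
se <- 0.8
t_crit <- qt(0.975, df = 27)  # about 2.052

ci <- beta_hat + c(-1, 1) * t_crit * se
round(ci, 1)
#> [1] 1.6 4.8
```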

Interpreting Confidence Intervals

What this means:
If we repeated this procedure many times with different samples from the same population, 95% of the intervals would contain the true \(\beta_1\)

What we CANNOT say:
“There is a 95% probability that \(\beta_1\) is in this interval”
(The parameter is fixed, the interval is random!)

DEMO

You Try: Build a Confidence Interval

Given:
- \(\hat{\beta}_2 = 5.6\)
- \(\text{se}(\hat{\beta}_2) = 1.2\)
- \(n = 25\), \(p = 4\)
- \(t_{0.025, 21} = 2.08\)

Find: 95% confidence interval for \(\beta_2\)

03:00

Computing CIs in R

Let’s do it in R

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
model <- lm(y ~ x)

confint(model)
                 2.5 %    97.5 %
(Intercept) -0.5803601 0.6803601
x            1.7999393 2.1800607
# 90% confidence intervals
confint(model, level = 0.90)
                   5 %      95 %
(Intercept) -0.4161403 0.5161403
x            1.8494534 2.1305466

Manual Calculation in R

beta_hat <- coef(model)
se_beta <- summary(model)$coefficients[, "Std. Error"]
n <- length(y)
p <- length(beta_hat)
df <- n - p

t_crit <- qt(0.975, df)  

lower <- beta_hat - t_crit * se_beta
upper <- beta_hat + t_crit * se_beta

cbind(lower, upper)
                 lower     upper
(Intercept) -0.5803601 0.6803601
x            1.7999393 2.1800607

Your turn

Using the data below, fit a model and get 99% confidence intervals.

# Data
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 3, 3, 5, 6, 7)
y <- c(3, 5, 6, 8, 10, 11)

# Your turn: fit model and get 99% CIs
03:00

Bootstrap Confidence Intervals

What is the Bootstrap?

We can use the data to understand its own variability

How it works:
1. Resample your data (with replacement)
2. Fit model to resampled data
3. Record \(\hat{\beta}\)
4. Repeat many times (typically 1000+)
5. Use distribution of \(\hat{\beta}\) values to build CI

Types of Bootstrap CIs

1. Percentile method
Take quantiles of the bootstrap distribution

2. Bootstrap t
Take the SE from bootstrap distribution and use a critical t-value to construct an interval

3. Bias-corrected and accelerated (BCa)
Adjusts for bias and skewness (most accurate)
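All three interval types are available from the boot package (assumed installed; it ships with R as a recommended package). A sketch for the slope of a simple regression:

```r
library(boot)

set.seed(1)
dat <- data.frame(x = 1:20)
dat$y <- 2 + 3 * dat$x + rnorm(20, 0, 5)

# the statistic function must accept the data and a vector of resampled row indices
slope_stat <- function(data, idx) coef(lm(y ~ x, data = data[idx, ]))[[2]]

b <- boot(dat, slope_stat, R = 1000)
boot.ci(b, type = c("perc", "bca"))  # percentile and BCa intervals
```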

Percentile Bootstrap Method

Algorithm:
1. Generate B bootstrap samples (e.g., B = 1000)
2. For each sample, compute \(\hat{\beta}_j^{(b)}\)
3. Sort all B values
4. Take 2.5th and 97.5th percentiles for 95% CI

CI formula:
\[\text{CI}_{95\%} = [Q_{0.025}, Q_{0.975}]\] where \(Q_\alpha\) is the \(\alpha\)-quantile of bootstrap distribution

Bootstrap Algorithm Detail

For each bootstrap iteration \(b = 1, \ldots, B\):

  1. Resample indices: \(i_1^{(b)}, \ldots, i_n^{(b)}\) from \(\{1, \ldots, n\}\) with replacement
  2. Create bootstrap sample: \(\mathbf{y}^{(b)} = (y_{i_1^{(b)}}, \ldots, y_{i_n^{(b)}})^T\),
    \(\mathbf{X}^{(b)} = (\mathbf{x}_{i_1^{(b)}}, \ldots, \mathbf{x}_{i_n^{(b)}})^T\)
  3. Fit model: \(\hat{\boldsymbol{\beta}}^{(b)} = ((\mathbf{X}^{(b)})^T\mathbf{X}^{(b)})^{-1}(\mathbf{X}^{(b)})^T\mathbf{y}^{(b)}\)
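One iteration of these steps, written with the matrix formula directly (a sketch; fitting with lm() as on the next slide gives the same coefficients):

```r
set.seed(1)
n <- 20
x <- 1:n
y <- 2 + 3 * x + rnorm(n, 0, 5)
X <- cbind(1, x)  # design matrix with an intercept column

idx <- sample(1:n, replace = TRUE)  # step 1: resample indices
Xb  <- X[idx, ]                     # step 2: bootstrap sample
yb  <- y[idx]
beta_b <- solve(t(Xb) %*% Xb) %*% t(Xb) %*% yb  # step 3: fit
drop(beta_b)
```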

Implementing Bootstrap in R

set.seed(1)
x <- 1:20
y <- 2 + 3*x + rnorm(20, 0, 5)
data <- data.frame(x, y)

model <- lm(y ~ x, data)
beta_hat <- coef(model)

B <- 1000

bootstrap_one <- function(data) {
  boot_data <- data[sample(1:nrow(data), replace = TRUE), ]
  boot_model <- lm(y ~ x, data = boot_data)
  boot_beta_hat <- coef(boot_model)
  return(boot_beta_hat)
}

boot_betas <- purrr::map_df(1:B, ~bootstrap_one(data)) 
boot_betas
# A tibble: 1,000 × 2
   `(Intercept)`     x
           <dbl> <dbl>
 1        -2.01   3.28
 2         2.64   3.02
 3         0.256  3.27
 4        -0.131  3.08
 5         2.07   3.17
 6         1.60   3.11
 7         0.584  3.18
 8         4.55   2.80
 9         4.14   2.86
10         2.25   3.01
# ℹ 990 more rows

Bootstrap CIs

boot_ci_intercept <- quantile(boot_betas$`(Intercept)`, c(0.025, 0.975))
boot_ci_slope <- quantile(boot_betas$x, c(0.025, 0.975))

cat("Bootstrap CI for intercept:", round(boot_ci_intercept, 3), "\n")
Bootstrap CI for intercept: -1.261 5.968 
cat("Bootstrap CI for slope:", round(boot_ci_slope, 3), "\n")
Bootstrap CI for slope: 2.755 3.351 
# Compare to traditional CI
confint(model)
                2.5 %   97.5 %
(Intercept) -2.714035 6.353106
x            2.729458 3.486368

Visualizing Bootstrap Distribution

Your Turn

Compute a bootstrap 90% CI for the slope

# Use this data
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.8, 6.2, 7.9, 9.8, 12.1, 14.3, 15.8)
05:00

Coverage Probability

What is Coverage?

Definition: The proportion of confidence intervals that contain the true parameter

Ideal coverage: Should match confidence level
- 95% CI should have 95% coverage
- 90% CI should have 90% coverage

In practice: Coverage can be lower due to:
- Violated assumptions
- Small sample sizes
- Model misspecification

Checking Coverage via Simulation

The procedure:
1. Choose true parameter values
2. Generate data from the model
3. Compute confidence interval
4. Check if true value is inside
5. Repeat many times
6. Calculate proportion containing truth

Coverage Simulation Example

set.seed(1)
n_sims <- 1000

sim <- function(true_beta0 = 2, true_beta1 = 1, sigma = 5, n = 30) {
  # Generate data
  x <- runif(n, 0, 10)
  y <- true_beta0 + true_beta1 * x + rnorm(n, 0, sigma)
  
  # Fit model and get CI
  model <- lm(y ~ x)
  ci <- confint(model, level = 0.95)[2, ]  # CI for slope
  
  # Check coverage
  ci[1] <= true_beta1 && true_beta1 <= ci[2]
}

coverage <- purrr::map_lgl(1:n_sims, ~sim())
mean(coverage)
[1] 0.957

Interpreting Coverage Results

Our result: ~95% coverage (as expected!)

If coverage is too low (<95%):
- Standard errors may be underestimated
- Assumptions may be violated
- Consider robust methods or bootstrap

If coverage is too high (>95%):
- CIs are conservative (too wide)
- Losing power
- May be okay, but inefficient
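If under-coverage stems from heteroscedasticity, one common remedy is a heteroscedasticity-consistent ("sandwich") standard error. A sketch using the sandwich and lmtest packages (assumed installed; these are not part of base R):

```r
# Assumes the sandwich and lmtest packages are installed
library(sandwich)
library(lmtest)

set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 1 * x + rnorm(50, 0, 1 + x)  # error sd grows with x

model <- lm(y ~ x)
coefci(model)                         # classical CIs
coefci(model, vcov. = vcovHC(model))  # robust (HC3) CIs
```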

Factors Affecting Coverage

Sample size:
- Small \(n\): coverage can be poor
- Large \(n\): coverage approaches the nominal level

Assumption violations:
- Non-normality (less important with large n)
- Heteroscedasticity
- Outliers

Model misspecification:
- Wrong functional form
- Missing important variables
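These factors can be checked with the same simulation recipe as before. For example (a sketch; the exact proportion will vary with the seed), letting the error standard deviation grow with \(x\) typically pulls the slope CI's coverage below the nominal 95%:

```r
set.seed(1)

sim_het <- function(true_beta1 = 1, n = 30) {
  x <- runif(n, 0, 10)
  y <- 2 + true_beta1 * x + rnorm(n, 0, 1 + x)  # heteroscedastic errors
  ci <- confint(lm(y ~ x))[2, ]  # CI for slope
  ci[1] <= true_beta1 && true_beta1 <= ci[2]
}

coverage <- purrr::map_lgl(1:1000, ~sim_het())
mean(coverage)
```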